Minima Stability


The Implicit Bias of Minima Stability: A View from Function Space

Neural Information Processing Systems

The loss landscapes of over-parameterized neural networks have multiple global minima. However, it is well known that stochastic gradient descent (SGD) can stably converge only to minima that are sufficiently flat w.r.t. its step size. In this paper we study the effect that this mechanism has on the function implemented by the trained model. First, we extend the existing knowledge on minima stability to non-differentiable minima, which are common in ReLU nets. We then use our stability results to study a single-hidden-layer univariate ReLU network.
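
As background for the flatness condition mentioned in the abstract, the following is the standard linear-stability sketch for full-batch gradient descent at a twice-differentiable minimum (a textbook argument, not this paper's contribution; the paper's novelty lies in extending stability analysis to non-differentiable minima and reading off its consequences in function space):

```latex
% Gradient descent with step size \eta near a twice-differentiable
% minimum \theta^* with Hessian \nabla^2 L(\theta^*):
\[
  \theta_{t+1} = \theta_t - \eta \nabla L(\theta_t)
  \quad\Longrightarrow\quad
  \theta_{t+1} - \theta^\ast \approx
  \left(I - \eta \nabla^2 L(\theta^\ast)\right)\left(\theta_t - \theta^\ast\right),
\]
% so the iterates remain near \theta^* only if |1 - \eta\lambda| \le 1
% for every Hessian eigenvalue \lambda, i.e. the minimum is flat enough:
\[
  \lambda_{\max}\!\left(\nabla^2 L(\theta^\ast)\right) \;\le\; \frac{2}{\eta}.
\]
```

For SGD the precise threshold differs, but the qualitative picture is the same: larger step sizes rule out sharper minima.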


The Implicit Bias of Minima Stability: A View from Function Space Supplementary Material

Neural Information Processing Systems

This document contains supplementary material for the article 'The Implicit Bias of Minima Stability: A View from Function Space'. Its parts include: I. experimental details and additional experiments; II. a switching-system formulation for single-hidden-layer ReLU networks; VIII. a generalization of Lemma 4 to global minima that are not twice-differentiable. Excerpts from the experimental details note that a different dataset was generated for the experiment in Sec. 5, that a learning-rate warm-up was used in all training runs, and that the networks used PyTorch's standard initialization multiplied by different factors.
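
Where the supplement mentions multiplying PyTorch's standard initialization by different factors, a minimal sketch of what that might look like is given below; the architecture, width, and scale factor are illustrative assumptions, not the authors' code:

```python
import torch
import torch.nn as nn

# Hypothetical scale factor applied on top of the default initialization.
init_scale = 4.0

# A single-hidden-layer univariate ReLU network, as studied in the paper;
# the hidden width of 500 is an illustrative choice.
model = nn.Sequential(
    nn.Linear(1, 500),
    nn.ReLU(),
    nn.Linear(500, 1),
)

# PyTorch's nn.Linear layers come with the standard (Kaiming-uniform)
# initialization; here we simply rescale every parameter in place.
with torch.no_grad():
    for p in model.parameters():
        p.mul_(init_scale)
```

Varying `init_scale` across training runs is one simple way to sweep initialization magnitude while keeping the default initialization's shape.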

